1 Introduction


A cognitive approach to language asks both representational and computational questions. Our aim in our recent work, summarized in The Grammatical Basis of Linguistic Performance, is to discover both what our knowledge of language is (a question about representation) and how that knowledge is put to use (a question about computation). We argued, and we'll reinforce that argument here, that we can gain a deeper understanding of why natural languages are built the way they are by considering how the problems of efficient parsing and learning connect to the representation of grammars. We showed that if one is willing to make a few strong but natural assumptions about constraints on human parsing abilities and how grammars are used as parsers, then one can show, in part, why locality constraints like Subjacency must be a part of grammatical descriptions. Our assumptions were these:

• Parsing is deterministic, in the sense that once information about the 
structure of a sentence is written down, it is never retracted. This means 
that the information about a sentence is monotonically preserved during 
analysis. 

• Grammatical representations are embedded directly into parsers, without
intervening derived predicates or multiplied-out rule systems. This is an 
assumption of transparency (Berwick and Weinberg 1984). 

• The human brain is finite. 

The assumptions about determinism and transparency are strong, but, as
we'll see, natural. They are meant to be. Our explanatory punch works in
direct proportion to the strength of the constraints: if we adopt a system where 
anything goes, then we cannot explain why languages are built one way rather 
than another. 

Naturally, and fortunately, this leaves the system of assumptions open to refutation. In a recent article to appear in Language and Cognitive Processes (1985), Janet Fodor takes issue both with the linguistic details behind the theory of grammar we adopt and with the assumptions of monotonicity and transparency. We believe that each of these criticisms falls short, and we'll survey just what Fodor says as well as our own position, but before launching into a bill of particulars, it's worthwhile to step back and survey the approach Fodor implicitly endorses.

There's a style of theory construction in A.I. that might be dubbed “universal simulation.” The idea is to adopt the weakest possible set of assumptions about a computational process, for fear of being wrong. A lampoon version goes something like this: (i) every cognitive process is a computational process; (ii) Turing machines can simulate any computational process; so (iii) I'd better adopt a Turing machine as a model of this cognitive process, because otherwise I may miss something. That's sheer hyperbole, of course, but something disturbingly close to this lies behind the embrace of nondeterminism as a central feature of parsing models. The problem, as we specifically observe in our book and as Fodor echoes, is that since nondeterministic computation subsumes deterministic computation, one can always simulate the effect of the deterministic assumption simply by making the cost of nondeterminism very high. What Fodor fails to note is the flip side to this point: one can always get the functional effect of recovery from failed determinism, such as garden paths, by adding recovery procedures to deterministic parsers. So why all the fuss? Don't these two apparently opposed camps just merge into a gray middle ground?

The difference is one of point of view and methodological stance. Forcing an essentially nondeterministic procedure to be deterministic by adding cost to backup violates the spirit of nondeterministic computation in precisely the same way that arbitrary backtracking would violate the spirit of determinism. We prefer to make the stronger, and more refutable, hypotheses about transparency and determinism. We'd argue that recovery from garden paths and near garden paths need not cause a deterministic parser to throw up its hands, but invokes quite particular, non-ad hoc reconstruction procedures that use the information built up about the parse in a deterministic way. More about that later. The important point here is that we adopt the determinism requirement as a basic article, a “leading idea,” to be weakened only under duress and in quite limited, particular cases. In contrast, based on the same evidence, Fodor adopts nondeterminism as a leading idea. These different positions lead to quite different ways of thinking about parsing. For someone who endorses nondeterminism, the hard part isn't figuring out how parsing gets done; that's easier, because we have more machinery at our disposal. The hard part is figuring out what the constraints are and how to naturally enforce them. We must now be able to say why parsing isn't done some other way that is just as easy to encode using the extra machinery of nondeterminism. Plainly the burden of proof here falls on Fodor's shoulders; her position is the weaker one. One example of this point should suffice. Fodor argues that adding an extra memory cell or its functional equivalent to a transition network parser (e.g., a hold cell) makes parsing easy. Therefore, she concludes, it should be added. More strikingly, she comments: “B[erwick] and W[einberg] simply have to stipulate that their parser has no such facility” (page 50; our emphasis). But since when does one have to stipulate the nonexistence of additional machinery? As Marcus (1980:140) says on this point, “What demands explanation and motivation is why a given facility is included in the model .... Thus, there is no reason to explain why a mechanism of only limited power has been implemented if it can be shown that it is enough to [do] the job that is required.” What is more, by sticking to more restricted machinery, we can actually explain some of the structural characteristics of natural languages.
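To make the hold-cell idea concrete, here is a minimal Python sketch of a left-to-right recognizer with one extra memory register. A fronted wh-phrase is stored in the hold cell and discharged as a trace when a transitive verb turns out to lack an overt object. The toy lexicon, the category labels and the function name are our own illustrative assumptions; this is not a reconstruction of Fodor's proposal or of any particular transition network implementation.

```python
# Toy lexicon mapping words to categories (illustrative assumption).
LEXICON = {
    "who": "WH", "what": "WH",
    "did": "AUX",
    "you": "NP", "john": "NP", "mary": "NP",
    "see": "TV", "meet": "TV",  # transitive verbs: they require an object
}


def parse_with_hold_cell(words):
    """Analyze a toy question, filling one object gap from a hold cell."""
    hold = None      # the single extra memory register
    output = []
    for i, raw in enumerate(words):
        word = raw.lower()
        cat = LEXICON.get(word)
        if cat is None:
            raise ValueError(f"unknown word: {raw}")
        if cat == "WH" and i == 0:
            hold = word                    # store the fronted wh-phrase
            output.append(f"{word}_i")
        elif cat == "TV":
            output.append(word)
            has_object = i + 1 < len(words) and LEXICON.get(words[i + 1].lower()) == "NP"
            if not has_object:
                if hold is None:
                    raise ValueError("object missing and hold cell empty")
                output.append("e_i")       # discharge the hold cell as a trace
                hold = None
        else:
            output.append(word)
    if hold is not None:
        raise ValueError("hold cell never discharged")
    return " ".join(output)


print(parse_with_hold_cell(["Who", "did", "you", "see"]))  # who_i did you see e_i
```

Nothing in the sketch is hard to write, which is the point at issue: the register makes this parse easy, but on the view quoted from Marcus it is the register's inclusion, not its absence, that needs motivation.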



Of course our leading idea may be incorrect. Then we will be led, regrettably, to nondeterminism, to nontransparency, and perhaps beyond. We say regrettably, because then we will be in a weaker position. Once the Pandora's box of unlimited nondeterministic computation is opened, we can nail it shut only by importing constraints from other domains. Again, this may be possible; we cannot rule it out. Fodor hints at constraints on grammar size having to do with parsing/learnability, but we'll see that these arguments lack support. Simply put, the search space of nondeterministically- and nontransparently-based theories is much vaster. We prefer to start with the much smaller world of determinism and work outwards.

We were well aware of this difficulty in our book. That's why we took great pains to distinguish between two versions of nondeterminism: (1) “true” nondeterminism in parsing, where all interpretations are carried along simultaneously; and (2) “backtracking” nondeterminism, where all nondeterministic alternatives are explored one at a time. We carefully observed that our functional arguments bifurcating deterministic and nondeterministic parsing applied only to true nondeterminism. By thinking about this contrast, we were led to quite specific predictions about locality constraints in natural languages, predictions that are, as we show in our book and as we'll underscore below, confirmed.
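The two kinds of nondeterminism can be contrasted in a minimal sketch on the local ambiguity of John believes Bill is a fool, where Bill is at first either the object of believes or the subject of an embedded clause. The category names, the candidate lists and the two constraint functions are illustrative assumptions, not a grammar from our book. The parallel parser carries every surviving analysis forward at once; the backtracking parser commits to one analysis at a time and retracts it when it fails.

```python
from typing import Dict, List, Optional, Tuple

# Candidate analyses per word (toy assumption); list order = preference order.
CANDIDATES: Dict[str, List[str]] = {
    "john": ["SUBJ"],
    "believes": ["V"],
    "bill": ["OBJ", "EMB_SUBJ"],   # locally ambiguous: object or embedded subject
    "is": ["EMB_V"],
    "a": ["DET"],
    "fool": ["N"],
}


def prefix_ok(tags: List[str]) -> bool:
    """An embedded verb must immediately follow an embedded subject."""
    return all(t != "EMB_V" or (k > 0 and tags[k - 1] == "EMB_SUBJ")
               for k, t in enumerate(tags))


def complete_ok(tags: List[str]) -> bool:
    """A finished analysis may not leave an embedded subject without its verb."""
    return prefix_ok(tags) and all(
        t != "EMB_SUBJ" or (k + 1 < len(tags) and tags[k + 1] == "EMB_V")
        for k, t in enumerate(tags))


def parse_parallel(words: List[str]) -> Tuple[List[List[str]], int]:
    """'True' nondeterminism: carry every viable analysis forward at once."""
    analyses: List[List[str]] = [[]]
    peak = 1  # largest number of analyses held simultaneously
    for w in words:
        analyses = [a + [c] for a in analyses for c in CANDIDATES[w]
                    if prefix_ok(a + [c])]
        peak = max(peak, len(analyses))
    return [a for a in analyses if complete_ok(a)], peak


def parse_backtracking(words: List[str]) -> Tuple[Optional[List[str]], int]:
    """Backtracking nondeterminism: pursue one analysis, retract it on failure."""
    retractions = 0

    def search(i: int, tags: List[str]) -> Optional[List[str]]:
        nonlocal retractions
        if i == len(words):
            return tags if complete_ok(tags) else None
        for c in CANDIDATES[words[i]]:
            extended = tags + [c]
            if not prefix_ok(extended):
                continue
            result = search(i + 1, extended)
            if result is not None:
                return result
            retractions += 1            # this choice led nowhere; undo it
        return None

    return search(0, []), retractions


print(parse_parallel("john believes bill is a fool".split()))
print(parse_backtracking("john believes bill is a fool".split()))
```

Both regimes find the same analysis; they differ in what they hold in memory (two analyses at once versus one) and in whether a written-down choice is ever withdrawn. A deterministic parser would have neither resource and would need to settle Bill's role from a bounded look at the right context.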

This much said, we can turn to Fodor's particular objections. As we noted earlier, they fall into two parts: objections to our predictions about which constructions will obey Subjacency and which will not; and objections to our three key assumptions. As to the first set of objections, we'll see that while Fodor's more refined observations about which constructions obey Subjacency and which do not are correct, they in fact support our “leading idea” of determinism. The second set of objections centers on the assumption of determinism and its relationship to efficient parsability, our “modular” parser design and the direct embedding of grammatical representations in the parser, and the restricted space for writing down grammatical operations.

2 Determinism makes the right grammatical predictions

Turning first to the grammatical predictions of our model, Fodor's interesting critique argues that our approach is both too strong and too weak. It is too strong in that our approach predicts parasitic gaps to be subject to Subjacency. This is because their deterministic detection requires scanning the left context.1 Nonetheless, we claimed that the distribution of these categories was not governed by Subjacency.

Further, our approach is too weak because it cannot distinguish a subset of gapping constructions that Fodor shows obey locality from a class that does not.2

First, we will show that Fodor's criticisms, while correct, deal with noncrucial assumptions of our analysis. The assumptions that replace them are fully compatible with our theory, and the data cited by Fodor actually support our analysis in interesting ways.3

2.1 Parasitic gaps

The most important thing to notice about our claim that parasitic gaps are not subject to Subjacency is that it is false. Chomsky (class lectures, 1984) provides the following examples showing that these constructions are in fact subject to this constraint:

1. Who_i did you read a book about e_i to e_i?

2. Which man_i did you interview e_i without reading up on e_i?

*3. Which man_i did you interview e_i without reading [NP [the file]_j [S' you made e_j on e_i]]?

In (1), both gaps are subjacent to the complementizer and to each other. This is shown by both (4) and (5), where overt movement from both the parasitic and the regular gap positions is acceptable.

4. Who_i did you read a book about e_i?

5. Who_i did you read the book (that Mary bought yesterday) to e_i?

1 To show this, Fodor cites examples where, in order to know whether an adjunct clause with an ambiguous verb can take a parasitic gap object, we must see whether the matrix clause contains a wh element in COMP. The relevant examples are contrasted in (a) and (b):

(a) What did you cook without eating?

(b) Can you watch TV without eating?

In the second example, eating is unambiguously an intransitive verb, because there is no wh movement in the matrix clause.

2 Before turning to those specific cases, let us dispense with one of Fodor's more general criticisms, namely that since the solution adopted does not solve all cases of parsing ambiguity, it is dubious from the evolutionary perspective. In fact, this kind of compromise is typical of what one finds in natural selection. The evolutionary literature abounds with cases where selection has opted for solutions that either solve part of an evolutionary problem or create other problems. (See footnote 10 of Berwick and Weinberg 1982.) Indeed, Gould (1983) cautions us against adaptationists who theorized “a world of perfect design, not much different from that ‘concocted’ by 18th century natural theologians who ‘proved’ God's existence by the perfect architecture of organisms ... we do not inhabit a perfected world where natural selection ruthlessly scrutinizes all organic structures and then molds them for optimal utility” (1983:155-156).

3 The following is a very condensed version of Weinberg (forthcoming). 





Chomsky uses the contrast in (2) and (3) to argue that parasitic gaps are bound to empty operators and are licit only if they are subjacent to these operators. These empty operators are interpreted as marks of predication and so must appear at the head of the adjunct clause.4 Put in terms of our parsing model, we can use the presence of the overt operator to signal the presence of the “real” gap. The placement of the empty operator is governed by the independent principles of A-bar binding. The presence of the empty operator, in turn, can be used to signal the presence of the parasitic gap, if it is in a subjacent position.5 In addition, Chomsky assumes that the theory of government interacts with the theory of bounding in that only ungoverned nodes count for bounding. Therefore, we will assume that the empty operator is subjacent to the real operator.6 This analysis predicts that (3) is bad because, as a sign of predication between the relative clause and the head of the complex NP, the empty operator inside this relative must be bound to (coindexed with) the head. Coindexing the parasitic gap to this operator as well will result in an ill-formed structure, because quantifiers cannot be bound to two variables, as in (6). Neither the overt operator at the head of the sentence nor the empty operator at the head of the adjunct is subjacent to the gap, and so neither can license it. Therefore this structure is ruled out. This contrasts with (2), where every trace is subjacent to the operator that licenses it, as shown in (7).

4 Alternatively, following Aoun and Clark (1985), we can claim that empty operators count as A-bar anaphors and so obey the locality conditions that apply to this class. See Weinberg (forthcoming) and Aoun, Hornstein, Lightfoot, and Weinberg (forthcoming) for details.

5 This contrasts with Chomsky (1982), where parasitic gaps are considered underlying PROs. Pr[...]y (1983) provides independent arguments showing that this account of the distribution of parasitic gaps is inadequate because it relies on the so-called functional definition of empty categories. In addition, the earlier analysis would obviously not predict the observed distribution of the data, since PROs are typically not bound by operators, empty or otherwise.

6 Chomsky must argue that all ungoverned nodes (not just NP or S) are bounding with respect to Subjacency. This is because he wants to rule out direct movement from an adjunct as in (a):

(a) *Which article did John read a book before filing?

In order to rule this out using Subjacency, he must claim that both PP and S count as bounding nodes. Moreover, he must use Subjacency to rule these cases out, because this is the only S-structure condition available to him and the bounding constraint in these constructions is an S-structure phenomenon, as shown by the grammaticality of (b):

(b) Who read a book before filing which article?

In Weinberg (forthcoming) and in Wahl (forthcoming) it is argued that the requirement of lexical proper government in Chomsky's ECP actually applies at the level of phonetic form (PF). This allows us to rule out a case like (a) by claiming that the trace in the COMP of the adjunct is not properly governed, as shown in the structure (c):

(c) *[S' Which article_i [did John read a book [before [S' e_i [PRO filing e_i]]]]]

Therefore, we can maintain the position that only S and NP count for the bounding system. Thus the empty operator is subjacent to the real operator in parasitic gap constructions.

*6. Which man_i [S did you [VP interview e_i] [PP without [S' OP_i [S PRO reading [NP the file_j [S' OP_j [S that you made e_j on e_i]]]]]]]?

7. Who_i [S did you [VP interview e_i] [PP without [S' OP_i [PRO reading up on e_i]]]]?
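The locality half of this analysis can be checked mechanically. The sketch below counts the bounding nodes that dominate a gap but not its operator and calls the pair subjacent when at most one is crossed. It assumes, as in footnote 6, that only S and NP are bounding nodes; the Node class, the function names and the simplified tree shapes (leaf labels such as "V reading" are arbitrary) are hypothetical. The index pattern for (6) is our reading of the example: adjunct operator OP_i, relative operator OP_j bound to the head, parasitic gap e_i inside the relative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BOUNDING = {"S", "NP"}  # assumption (footnote 6): only S and NP are bounding nodes


@dataclass(eq=False)   # compare nodes by identity, not by label
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def path_to(root: Node, target: Node) -> Optional[List[Node]]:
    """Nodes from root down to target, inclusive; None if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub is not None:
            return [root] + sub
    return None


def bounding_nodes_crossed(root: Node, operator: Node, gap: Node) -> int:
    """Count bounding nodes that dominate the gap but not the operator."""
    above_operator = path_to(root, operator)
    above_gap = path_to(root, gap)
    return sum(1 for n in above_gap[:-1]
               if n not in above_operator and n.label in BOUNDING)


def subjacent(root: Node, operator: Node, gap: Node) -> bool:
    return bounding_nodes_crossed(root, operator, gap) <= 1


def example_7():
    """Adjunct of (7): [PP without [S' OP_i [S PRO reading up on e_i]]]."""
    op, gap = Node("OP_i"), Node("e_i")
    s = Node("S", [Node("PRO"), Node("V reading"), Node("P up on"), gap])
    return Node("PP", [Node("P without"), Node("S'", [op, s])]), op, gap


def example_6():
    """Adjunct of (6), with a relative clause inside the object NP."""
    op_adj, op_rel, gap = Node("OP_i"), Node("OP_j"), Node("e_i")
    rel_s = Node("S", [Node("V made"), Node("e_j"), Node("P on"), gap])
    np = Node("NP", [Node("Det the"), Node("N file"), Node("S'", [op_rel, rel_s])])
    adj_s = Node("S", [Node("PRO"), Node("V reading"), np])
    root = Node("PP", [Node("P without"), Node("S'", [op_adj, adj_s])])
    return root, op_adj, op_rel, gap


root7, op7, gap7 = example_7()
root6, op_adj, op_rel, gap6 = example_6()
print(subjacent(root7, op7, gap7))                     # True: only S is crossed
print(bounding_nodes_crossed(root6, op_adj, gap6))     # 3: S, NP and S
```

The relative operator in (6) does pass this count; what excludes it is the separate requirement that it is already coindexed with the head and so cannot also bind the parasitic gap. The sketch checks only locality.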

Thus, in fact, Fodor is correct in claiming that our analysis should predict that parasitic gaps are governed by Subjacency, and we were mistaken when we claimed in our book that it did not. But we were all incorrect in believing that the constraint did not hold. Assuming that we can show that the creation of empty operators causes no problems for a deterministic system, we can use their presence to license parasitic gaps in the appropriate structures. Thus we can make the parsing model predict the properties of this construction in a straightforward and independently motivated way. It is important to note at this point that we are not changing assumptions in an ad hoc way simply to model the facts. The problem with our first attempt was that we did not follow the logic of our predictions clearly. The model actually predicts that parasitic gaps should be governed by Subjacency, as Fodor notes in her article. In the next section, we will show that the model is non-ad hoc in other ways, in that it, or something like it, is needed to solve a general parsing problem that is independent of the determinism issue.

In this section, we present an algorithm to create empty operators that is also compatible with a deterministic approach. Note that the case of empty operators in adjuncts is similar to the case of factive Noun Phrases cited by Fodor in her criticism of Marcus. As in factives, the presence of the overt operator makes parasitic gaps possible in adjunct positions, but it does not make them obligatory in those structures. Consider (8)-(10).

8. Who did you meet without greeting?

9. Who did you meet without greeting him?

10. Who did you meet without clearing the rendezvous with security?

In a case like (8), the parser must place an empty operator in the complementizer of the adjunct phrase in order to bind the empty parasitic object of the verb greeting. In (9) and (10), by contrast, we do not want to place an empty operator in this position, because there is no parasitic gap in the adjunct for the operator to bind.7 In (9) the parasitic gap position is filled by a pronoun, and in (10) there is no corresponding gap position at all. Because of the possibility of successive cyclic movement, however, the parasitic gap can be indefinitely far away on the surface from the empty operator position. A deterministic parser with limited lookahead will not be able to wait for the disambiguating right context.8 Therefore, there will be certain cases in which it will incorrectly place an empty operator in the adjunct's COMP.

7 If these operators are available at all stages of comprehension, then the fact that the empty operator has no variable to bind should make the sentence as bad as (a):

(a) Who did John meet Mary?

8 The requirement that lookahead be limited is crucial because, as Marcus (1980) notes, a deterministic parser with unlimited lookahead could well turn out to be able to simulate a nondeterministic machine.

Fodor implies that these facts pose a problem solely for deterministic parsers, suggesting that a nondeterministic solution is called for. In fact, the deterministic/nondeterministic issue is beside the point. If the distinction is between a deterministic parser and a nondeterministic parser that backtracks (Fodor's choice), then both will have problems, because they both, at least superficially, predict that people will have noticeable difficulty comprehending these sorts of sentences. But none of (8)-(10) is difficult to understand.

The nondeterministic parsers with backtracking that Fodor cites divide cases of possible parser error into three types:

(a) Cases that are locally ambiguous but cause the parser no difficulty. Here it is claimed either that the backtracking needed to transform an incorrect false start into a correct analysis is so minor that it is not associated with a computational cost, or that these parsers use an exact analog of a deterministic parser's local buffer solution and thus always make the right choice. Some examples of this kind of case are given in (11).

11a. John believes Bill.

11b. John believes Bill is a fool.

Even if the parser mistakenly hypothesized that the subject of the embedded infinitival was the direct object of the verb believe, the backtracking needed to insert the infinitival S marker between it and the verb is minor, and a nondeterministic parser might be able to correct its mistake in a way that is relatively cost-free.9

9 Note that this is true even for a deterministic parser, since we need only add a new piece of information. See the next section for a related example.

In contrast, there are cases that require more extensive backtracking over essentially unbounded distances. These cases can be divided into two types.

(b) Cases for which people register a strong preference for one of the possible analyses (even when pragmatic biasing points to the other choice), but where both readings are eventually available. An example of this case is shown in (12), where, as Fodor mentions, there is an initial preference for the reading where who is taken to be the subject of an embedded clause.

12. Who_i did the little girl beg to sing those stupid French songs (for) e_i?


(c) Cases of conscious garden paths where one reading is difficult. These are cases where the alternative has to be pointed out, even if it is the only reading resulting in a grammatical sentence. These include classic sentences such as (13):


13. The horse raced past the barn fell. 

The processing load here might be compatible with a backtracking approach if it is assumed that backtracking over long distances is computationally costly. (It can often be difficult to assess these effects in a backtracking model; see the next section.) The extra burden imposed by true garden paths is a complex effect that is partly lexical, partly structural, and exacerbated by distance (in terms of the number of alternative, but unconsidered, pathways).

Cases like (8)-(10) cause problems for the backtracking approach because they break the association between the extent of backtracking necessary to correct false starts and perceived sentence complexity. None of the examples in (8)-(10) produces processing complexity. This shows that there is not even a preference for adjuncts with or without parasitic gaps. Whatever the first hypothesis of the (deterministic or backtracking) parser, whether it inserts an empty operator in the adjunct's complementizer or not, one of the structures is incorrectly predicted to be difficult to process because of the extensive backtracking, from the site of the disambiguating parasitic gap or the end of the adjunct, needed to correct the mistake. (14a) and (14b) show that no extra processing complexity is observed even in cases where the disambiguating right context is very far away from the point where the decision about whether to insert an empty operator must be made.

14a. Who did you search for without telling Sue to convince Bill to ask
Harry to come with you?

14b. Who did you search for without telling Bill to ask Sue to inform 
Harry that you would meet? 
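A deliberately crude sketch makes the lookahead problem concrete. It measures how many words separate the point where the empty-operator decision arises (right after without) from the end of the adjunct, and asks whether that position falls inside a fixed window. The three-word window is loosely modeled on the small buffer of Marcus (1980); counting words rather than constituents, and treating the end of the sentence as the disambiguating position, are our simplifying assumptions, and the function names are hypothetical.

```python
def words_to_disambiguation(sentence: str, decision_word: str = "without") -> int:
    """Words between the empty-operator decision point and the end of the sentence.

    Simplifying assumption: the adjunct runs to the end of the sentence, and only
    its final position settles whether a parasitic gap occurred.
    """
    words = sentence.rstrip("?.").split()
    return len(words) - (words.index(decision_word) + 1)


def decidable_within(sentence: str, window: int = 3) -> bool:
    """True if the disambiguating position lies inside a fixed lookahead window."""
    return words_to_disambiguation(sentence) <= window


for s in [
    "Who did you meet without greeting?",
    "Who did you meet without greeting him?",
    "Who did you search for without telling Sue to convince Bill to ask Harry to come with you?",
    "Who did you search for without telling Bill to ask Sue to inform Harry that you would meet?",
]:
    print(words_to_disambiguation(s), decidable_within(s))
```

Lengthening the embedding in (14a) and (14b) pushes the count past any fixed window, so the decision cannot in general be postponed until the evidence arrives.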

It seems, then, that these kinds of sentences are problems for both deterministic and nondeterministic (backtracking) parsers. We could solve them if we could design an algorithm in which the semantic component simply didn't interpret empty operators unless they were eventually bound to elements in argument positions. Since these elements have no phonetic content, if they received no semantic interpretation, it would be as if these elements never existed.10 In that case we could insert the empty operator in all sentences and still be sure to be right, because an unbound empty operator would simply be ignored: it is invisible. In fact, the two-stage parsing model discussed in our book provides just such a mechanism.
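The strategy just described can be sketched in a few lines: a first stage writes its decisions into an append-only record and never deletes an operator it has posited, while a later interpretive step skips operators that were never bound. The record format, the "posit"/"bind" fact names and all function names are hypothetical; the sketch shows only that the right result is reached without retracting anything.

```python
from typing import List, Tuple

Fact = Tuple[str, ...]


class MonotonicRecord:
    """Append-only record of parse decisions: once written, never retracted."""

    def __init__(self) -> None:
        self._facts: List[Fact] = []

    def write(self, *fact: str) -> None:
        self._facts.append(tuple(fact))

    @property
    def facts(self) -> Tuple[Fact, ...]:
        return tuple(self._facts)      # read-only snapshot; there is no delete


def parse_adjunct(overt_wh: bool, object_gap: bool) -> MonotonicRecord:
    """Hypothetical first stage: always posit OP when an overt wh-operator precedes."""
    record = MonotonicRecord()
    if overt_wh:
        record.write("posit", "OP_j")          # committed immediately, never undone
    if object_gap:
        record.write("trace", "e_j")
        if overt_wh:
            record.write("bind", "OP_j", "e_j")
    return record


def interpreted_operators(record: MonotonicRecord) -> List[str]:
    """Hypothetical semantic stage: an empty operator is interpreted only if bound."""
    bound = {f[1] for f in record.facts if f[0] == "bind"}
    return [f[1] for f in record.facts if f[0] == "posit" and f[1] in bound]


print(interpreted_operators(parse_adjunct(True, True)))   # like (8): ['OP_j']
print(interpreted_operators(parse_adjunct(True, False)))  # like (9), (10): []
```

In the second call the posited operator is still in the record; it is simply never interpreted, which is the sense in which an unbound empty operator is harmless.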

We argued on conceptual and psycholinguistic grounds that the natural language processor was a two-stage mechanism. The first stage dealt with tree expansion and the second dealt with indexation. In addition to having a different function, the second stage worked on a different representation. During the first stage, the completion of a category signaled the parser to shunt the category's daughter into a separate stack, which we called the Propositional Node Stack (PNS). The intuition behind this shunting was that once a category's thematic role was established from its position in the syntactic tree, the parser wouldn't need to retain many of the details of syntactic structure. We showed that elements in the same c-command domain are not put in the PNS until all categories in the domain are complete. This algorithm allowed the parser to correctly compute c-command relations between categories. This was crucial, since these relations govern the application of the binding operations on the previously expanded tree. Pursuing the intuition that the PNS was a representation concerned with purely semantic aspects of the interpretation, we placed a semantic visibility condition on the categories appearing in this component. We claimed that to be interpreted by the semantic component (PNS), a category had to have semantic features. These were the features that allowed a Noun Phrase to denote either an individual or a set of individuals, or allowed a quantifier to delimit a scope.11 If a category had such features, it would be given a “referential index” and be visible in the PNS. If a category did not intrinsically have such features, it could obtain a referential index by being linked to an element that did.12 Given the shunting procedure, an element would have to be in the same c-command domain as its antecedent in order to receive a referential index before being shunted into the PNS. If an element did not receive an index before shunting, it would become invisible and receive no interpretation. This allowed us to provide a principled explanation for the fact that grammatical conditions specifying c-commanding antecedents seem to apply only to categories with no independent referential status.13

10 An alternative would obviously be to come up with an analysis that did not posit empty operators in these and related cases. Such an account is difficult to conceive of, because we would also have to account for the Subjacency effects that these constructions exhibit. By this we do not mean coming up with an alternative functional explanation for Subjacency in these cases. We mean allowing the parser (or the grammar) to distinguish those cases that are grammatical from those that do not obey the constraint.

11 Examples of categories with intrinsic semantic features are proper names like John, pronouns like him, and wh phrases like what or which man.

12 Categories that have no intrinsic semantic features and so can receive referential indices only by linking are bound anaphors like each other or herself, empty NP and wh traces, and certain non-wh quantified expressions. See Weinberg (forthcoming) for details.

Chomsky (1981 and 1984 class lectures) has suggested that association with a thematic (theta) role is also a necessary condition on visibility for semantic interpretation. We will adopt Chomsky's suggestion and state the combined condition on visibility as follows.

15. (Visibility Condition) To be visible in the PNS, an element must 
be associated with a theta role (either by occupying a theta position or 
binding an element in a theta position) and must have referential features 
(features that either designate an individual or set of individuals or that 
delimit a range). 
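Condition (15), together with the linking route to a referential index described above, can be sketched as a pair of predicates. The Category record, its boolean fields and the function names are hypothetical simplifications: the c-command domain and the timing of shunting are reduced to whether an antecedent link was made before the category reached the PNS, and antecedent links are assumed to be acyclic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    name: str
    intrinsic_referential: bool              # proper names, pronouns, wh phrases
    in_theta_position: bool = False
    binds_theta_position: bool = False
    antecedent: Optional["Category"] = None  # link made before shunting to the PNS
    phonetic: bool = False


def has_referential_index(cat: Category) -> bool:
    """Intrinsic referential features, or a link to a category that has an index."""
    if cat.intrinsic_referential:
        return True
    return cat.antecedent is not None and has_referential_index(cat.antecedent)


def visible_in_pns(cat: Category) -> bool:
    """Visibility Condition (15): a theta role AND referential features."""
    has_theta = cat.in_theta_position or cat.binds_theta_position
    return has_theta and has_referential_index(cat)


def harmless_if_uninterpreted(cat: Category) -> bool:
    """Only phonetically empty elements can go uninterpreted without unacceptability."""
    return visible_in_pns(cat) or not cat.phonetic


who = Category("who", True, binds_theta_position=True, phonetic=True)
op_8 = Category("OP", False, binds_theta_position=True, antecedent=who)  # binds a gap
trace_8 = Category("e", False, in_theta_position=True, antecedent=op_8)
op_9 = Category("OP", False, antecedent=who)                             # no gap to bind
print(visible_in_pns(op_8), visible_in_pns(trace_8), visible_in_pns(op_9))  # True True False
```

The operator without a gap fails only the theta half of the condition; because it has no phonetic content, its invisibility costs nothing, whereas a phonetically realized element in the same position would not be harmless.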

We will now show that the independently motivated shunting procedure and 
visibility conditions give an account of empty operators that explains why they 
cause no processing difficulties. 

Let us reconsider sentences (8)-(10). In (8), the parser recognizes that part
of the sentence is an adjunct phrase. This signals the possibility of a parasitic 
gap in the subsequent structure. The parser therefore inserts an empty operator 
in the COMP position, as shown in (16): 

16. Who_i did you meet e_i without [S' OP_j ...

If the parser subsequently finds a gap position in a subjacent domain, it can 
create a trace and bind the operator to it, thus associating the operator with a 
theta position, as in (17). 

17. Who_i did you meet e_i without [S' OP_j [S greeting e_j]]

Before shunting into the propositional node stack, the operator must locate 
an antecedent in the c-command domain with a referential index. If it does not 
find one, then neither it nor its trace will be interpreted, because even though 
they are associated with a theta role, they are not associated with a category 
that delimits a range. In this case the overt operator who is present in the
c-command domain, so both the empty operator and the trace can receive the 
category’s referential index (i) and so be interpreted in the PNS. 

Compare this to (18). In (18) below, the parser will also detect an adjunct. It will not detect an overt operator, and so no empty operator will be created. Since there is no empty operator, no parasitic gap will be created in this structure.

18. Did you watch the movie without [S' [S eating]]?


13 See Berwick and Weinberg (1984, pp. 173-182) for the conceptual arguments, and Weinberg (forthcoming) and Weinberg and Garrett (forthcoming) for psycholinguistic results and additional consequences of this approach.



In cases like (9) and (10) above, the adjunct and overt operator again trigger the creation of an empty operator. Since there is no gap in the adjunct phrase, the operator is not associated with a theta role. Therefore, even though there is an overt operator to link with, the empty operator does not meet the criterion for visibility at PNS and so is not interpreted.14 Since empty operators are not interpreted unless both conditions on visibility are met, a deterministic parser can always create these categories, because they can never force it to simulate nondeterminism, either by backtracking or by parallelism, in order to correct for past mistakes. Note that this solution will only work for empty operators. Lexically specified elements will receive a phonetic interpretation but no semantic interpretation, a situation that will lead to unacceptability. An empty element with no semantic features, however, is neither semantically nor phonetically interpreted and so simply plays no role in the interpretation of the sentence.15

The astute reader will have noted an apparent problem created by this solution.
Why, one might ask, if empty categories can become invisible at later
stages of interpretation, must we cue their creation to the presence of overt
operators? The cases that motivated the account in the first place were those in
which the local subcategorization of a verb was indeterminate. Before positing
an empty element after such a verb, we claimed that we had to make sure that
an actual operator was present in the previously analyzed structure. However,
given our present approach, one might be tempted to argue that if a verb that
can be optionally transitive turns out to be used intransitively in a given structure,
the gap will simply not be associated with an operator and so become
invisible in the PNS. This seems to dash the motivation for restrictions on left
context, crucial for the functional motivation of Subjacency in the first place.
But it is only elements with no phonetic features that can escape unacceptability
if they are not semantically interpreted. Since wh elements have case features, 16
they will be visible in the phonological component. 17 This makes certain
predictions about the applicability of Subjacency to NP movement. As noted in
Lasnik and Saito (1984), all the cases where we seem to need Subjacency to rule
out unacceptable NP movements are actually also ruled out redundantly by the
Empty Category Principle. Under our approach, we predict that NP movement
should not be governed by Subjacency, thus eliminating this redundancy, always
a welcome result. 18

14 This approach will also handle empty operators in tough movement, topicalization, relative
clauses, and the factive NPs that Fodor discusses in her criticism of Marcus. As should be
obvious, since all these structures also involve predication between a phrase and a head,
topic, or adjective phrase, exactly the same logic applies. See Weinberg (forthcoming) for
details.

15 Throughout this account, we have assumed, contra Chomsky, that the empty operator is
subjacent to the real operator. However, this assumption is not crucial, and remains to be
verified (or falsified) by some fairly subtle empirical facts. To show this, let us assume (with
Chomsky) that empty operators are not in fact subjacent to real operators. Then we must
predict that the possible presence of an empty operator is cued solely by the presence of
the adjunct structure. So in a case like (a),

(a) Did you catch a fish without eating?

the parser could mistakenly output a structure like (b):

(b) Did you catch a fish [PP without [OP_i [PRO eating e_i]]]

The empty operator and parasitic gap, having no referential indices, would disappear
from the semantic component’s representation. However, the case features on the parasitic
gap would make it visible in PF. In fact, some speakers report an initial bias towards
treating eat as a transitive verb in these structures, and thus say that the sentence sounds
unacceptable. Interestingly, this bias does not carry over to structures where this verb is
not in an adjunct:

(c) Did you think that Harry told Mary that he expected to eat?

If these sentences reflect true biases, then an algorithm based on Chomsky’s definition of
Subjacency would seem more appropriate. Such an account would be fully compatible with
our approach at the conceptual level. We have noted cases in our book where, in order to
be specifiable using terms licensed by the grammar, the Subjacency condition is in some
sense “stricter” than the parser’s needs. Here we have a case where a parser whose rules
are written using the grammar’s predicates will sometimes make mistakes. The prediction
is that people will make the same mistakes. The facts here, however, are quite subtle, and
since either alternative is compatible with our approach, we leave open the question of
whether to place the Subjacency requirements on the empty operator.

Looking at the distribution of parasitic gaps from the parsing perspective
allows us to supplement Chomsky’s analysis in important ways. It allows us
to derive the fact that parasitic gaps must be licensed at S-structure. That is,
we derive as a theorem the fact that quantifiers and wh operators that move to
COMP or some other pre-S position at LF do not create acceptable parasitic
gap structures, as shown by examples (19a) and (19b).

*19a. [S You [VP [VP met who_i] [PP without greeting e_i]]]

*19b. [Everyone [VP [VP met someone_i] [PP without greeting e_i]]]

16 See Chomsky (1981) for justification of this assumption.

17 See Aoun and Lightfoot (1984) for discussion.

18 See Weinberg (forthcoming) for details. Note that the non-government of NP movement
by Subjacency reinforces the point made in Berwick and Weinberg (1984), namely, that
Subjacency governs a natural class from the parsing perspective. The example just given
shows that Subjacency governs only a subset of the movement constructions; the gapping
examples discussed later in this section show that Subjacency governs a subset of the
deletion constructions. From a grammatical viewpoint, this is an entirely unnatural result.

This approach also makes sense of some preliminary results reported by Frazier (1984
NELS conference) and cited by Fodor in her article. Frazier claims that eye movement tasks
suggest that subjects try to fill gaps using operators that are not subjacent to them, if the
verbs governing the gap position are strongly subcategorized for direct objects. The cases
are like those in (a):

a. *What_i did [the girl [S who won e_i]] receive e_i

Given our approach, we might claim that the gap inside the island is created on the basis
of the empty operator in the COMP of the relative clause. The fact that subjects seem to
look back to the overt wh element is compatible with our approach if we claim that this is
the result of the attempt to bind this operator (an operation not governed by Subjacency)
to the overt operator.




We know independently that parasitic gap constructions are not licit if
the real gap occurs in Subject position. 19 In addition, if our analysis is correct,
the overt operator must occur in a c-commanding COMP. As mentioned, the
c-command requirement is ensured by the shunting design of the parser. If an
element does not c-command a category, it is not visible to it and so cannot
be used to create that category as we expand the parse tree. Neither the wh
element nor the quantifier in (19a) or (19b) c-commands the adjuncts containing
the parasitic gaps. Given the above account, there will be no binder to
give referential features to the empty operator in the COMPs of these adjuncts,
and thus neither they nor their traces will be interpreted in the PNS. Given
that the input for parsing decisions is the S-structure of the sentence, the
subsequent movement of a category to a c-commanding position at a post-S-structure
level cannot help the parser decide how to expand the parse tree. Our parsing
theory can derive both the fact that Subjacency is an S-structure property
and the Subjacent government of parasitic gaps along with their licensing at
S-structure: the central properties of the construction.
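The visibility condition can be made concrete with the familiar first-branching-node definition of c-command. The sketch below is only an illustration under our own assumptions: trees are encoded as nested Python tuples, nodes are addressed by paths of child indices, and the names subtree, dominates, and c_commands are hypothetical helpers, not part of the parser discussed here. Applied to a simplified S-structure for (19a), it confirms that who_i, inside the lower VP, does not c-command the adjunct PP, while the subject does.

```python
from typing import Tuple, Union

# A tree is either a word (str) or a tuple (label, child, child, ...).
Tree = Union[str, tuple]
# A node is addressed by the child indices leading to it from the root; () is the root.
Path = Tuple[int, ...]


def subtree(tree: Tree, path: Path) -> Tree:
    """Return the node reached by following `path` from the root."""
    for i in path:
        tree = tree[1 + i]  # children start right after the label
    return tree


def dominates(a: Path, b: Path) -> bool:
    """Proper dominance: node a is a strict ancestor of node b."""
    return len(a) < len(b) and b[:len(a)] == a


def c_commands(tree: Tree, a: Path, b: Path) -> bool:
    """a c-commands b iff neither dominates the other and the first
    branching node properly dominating a also dominates b."""
    if a == b or dominates(a, b) or dominates(b, a):
        return False
    ancestor = a[:-1]
    # Climb past non-branching nodes (nodes with exactly one child).
    while ancestor and len(subtree(tree, ancestor)) - 1 < 2:
        ancestor = ancestor[:-1]
    return dominates(ancestor, b)


# Simplified (19a) at S-structure; the empty operator and PRO are omitted.
TREE_19A = ("S",
            ("NP", "you"),
            ("VP",
             ("VP", ("V", "met"), ("NP", "who")),
             ("PP", ("P", "without"), ("S", ("V", "greeting"), ("NP", "e")))))
SUBJECT, WHO, PP = (0,), (1, 0, 1), (1, 1)

print(c_commands(TREE_19A, WHO, PP))      # False: who_i sits inside the lower VP
print(c_commands(TREE_19A, SUBJECT, PP))  # True
```

Because the check runs on the S-structure tree, moving who_i at LF cannot change its result.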


2.2 Gapping constructions 

Fodor’s next criticism deals with our analysis of gapping. She is correct in claiming
that our treatment does not distinguish the subset of gapping constructions
that obey bounding conditions from those that do not. As she points out,
escape from bounding correlates with the appearance of an auxiliary marker in
the pregap position. (20) and (21) illustrate.


20a. Mary fishes in the ocean and Harry in the sea. 

*20b. Mary fishes in the ocean and I think Harry in the sea. 


21a. Mary has fished in the ocean and Harry has in the sea. 


21b. Mary has fished in the ocean and I think Harry has in the sea.

In our previous analysis we claimed that bounding was expected in gapping 
constructions because the complements of the gapped verb had to be correctly 
attached in the VP internal or external position. Correct attachment depends 
on the properties of the verb. Since an overt verb is not available to direct
the parser in a gapped constituent, we predicted that deterministic attachment 
of these complements required a look at left context (some previous conjunct 

19 See Chomsky (1982). 
containing an overt verb). Given the usual requirement of bounded access to
this left context, the bounding constraint on these constructions followed. Since the
parser faces the same problem in both types of gapping constructions, Fodor is
right in claiming that we are incorrectly led to the conclusion that the presence
or absence of an auxiliary marker in the gapped constituent should not influence
the application of the constraint. Therefore, in countering this argument we
must show that complement attachment of PPs does not require access to left
context, but that there are other properties of gapping constructions that require
this access only in cases where no overt auxiliary precedes the gapping site.
Let’s start with the second point first. Consider the following examples.

22. [S I consider [S Bill [VP to be a fool]]]

23. [S I consider [S Bill [NP a fool]]]

In (22) the embedded clause is an infinitival with a VP predicate and in (23)
it is a small clause with an NP predicate. 20 The head of the VP predicate in 
(22) can be gapped, as shown in (24). 

24. [S John believes [S FRED is a FOOL] and [S' HENRY [VP [V 0] AN IDIOT]]] 21

Fodor (1975) has shown that (24) actually involves two different deletion
rules. Main Verb Deletion eliminates the verbal be form and Tense Deletion
removes the associated tense. Cast in parsing terms, the interpretation of the
second conjunct involves expanding the parse tree with both an empty tense
morpheme and an empty verb. Note, however, that the surface string in the second
conjunct is locally ambiguous and could be expanded as a gapped structure
or as a small clause. If we chose the small clause alternative, the sentence would 
be ruled out because believe does not take small clause complements, as shown 
by (25). 

*25. [S I believe [S John [NP a fool]]]

The only way that we can determine the proper expansion of the second 
conjunct in a case like (24) is by rescanning the left conjunct. Again we have 
a case where a deterministic tree expansion involves left context examination. 

20 The structure of small clauses is the subject of some controversy. Chomsky (1981), following
Stowell (1981), argues that embedded categories like Bill a fool form sentential complements
(in this case with the structure [AP [NP John] a fool]). Williams (1983) argues that
these categories do not form a constituent, and that they are properly analyzed as [... [NP
John] [NP a fool] ...]. Hornstein and Lightfoot (forthcoming) argue against Williams’s analysis
and in favor of a modified version of the Chomsky-Stowell approach. The only point
relevant to this argument, however, is that the predicates of small clauses are not VPs.

21 We follow Fodor’s convention of indicating the placement of heavy stress on a word by 
capitalization. 




Given our usual logic, we must ensure that we will never have to look at an
unbounded stretch of left context. Therefore, we predict that cases involving
tense deletion should obey bounding, exactly what Fodor demonstrates. As
additional evidence, consider (26a). If the parsing version of tense deletion is
governed by bounding, then we predict that the small clause analysis will be
the only permissible expansion of the embedded clause in the second conjunct.
Since believe doesn’t take small clauses, we predict the unacceptability of the
structure, in contrast with the acceptable (26b).

*26a. I think Fred is a fool and Sue believes John stupid.


26b. I think Fred is a fool and Sue believes John is stupid. 

In contrast, cases that involve only main verb deletion will never create the
same kind of ambiguous situations. This is because the presence of an overt 
auxiliary unambiguously signals that a verb phrase must follow. One never 
finds overt auxiliaries in small clauses. Since the parser will always be right if 
it expands the phrase after an overt auxiliary as an empty headed VP, it will 
never have to scan the left conjunct. In a case like (27) it simply uses the locally 
available overt auxiliary to decide about subsequent expansion of the tree. 

27. John has fished in the ocean and Bill has in the sea. 

Since we never need to examine left context when the auxiliary remains
in the surface string, we do not expect Main Verb Deletion to obey bounding 
constraints. This is in fact what Fodor observes. 

This account has another virtue. The information provided by the left context
to resolve the ambiguous cases will be available at the time the parser is
confronted with the ambiguous material of the second conjunct. This contrasts
with our previous analysis where, as Fodor correctly notes, proper identification
of a verb’s subcategorization and selectional properties demands access to the
actual verb of the previous conjunct. Unfortunately, our parser will have already
shunted this material into the PNS representation. Our parser shunts at
the end of c-command domains, leaving only immediate daughters of the completed
constituent available as information for future parsing decisions. This
is no problem for our new analysis because we distinguish small clauses from
gapped constituents merely by looking at previous conjuncts for the presence of
a tensed auxiliary. If we treat sentences as maximal projections of INFLection
(Chomsky 1981) and if we assume that lexical information about the head of a
category is projected from that head to its most maximal projection, then the
relevant information will percolate up to the highest S node on the tree and
thus be available to the parser for expansion decisions. 22


22 Projection to the most maximal projection is supported by movement of postverbal Subjects
in Italian. Since these elements occur in structures like (a), we must ensure that the verb
can transmit its features to the maximal VP in order for the trace of the postverbal Subject
to satisfy the conditions on proper government imposed by the ECP.

(a)

Consider again a structure like (24), repeated as (28), with irrelevant details
omitted.

By the time the parser reaches the locally ambiguous second conjunct, the
first conjunct will have been shunted to the PNS. Thus information contained
in this conjunct will not be available for decisions about tree expansion. This
causes no trouble, because the tensed character of the first conjunct
can be read off the highest INFL projection that c-commands and is boundedly
far from the INFL (INFL’) of the next conjunct. If the first conjunct was a small
clause, then the 0-inflection would also percolate up to the maximal S node. This
is all the information the parser needs to correctly expand the tree of the second
conjunct. If the previous conjunct contains a tensed or infinitival inflection, the
parser expands the conjunct as a gapped structure. If the previous conjunct
contains a 0 inflection, then the parser expands the ambiguous structure as a
small clause. This analysis makes the interesting prediction that if S' nodes
rather than S nodes are conjoined, tense deletion should be unacceptable. Since S'
is not a projection of INFL, conjunction of S' nodes would not allow percolation of
information beyond the first conjunct in a structure like (28). 23 Since expansion as a tensed
structure is conditioned by the presence of an overt auxiliary in the previous
conjunct, the parser will not be able to apply the tense deletion rule. This is
confirmed by comparing (29a) and (29b), where we have conjoined S nodes, with
(29c) and (29d), where we have conjoined S' nodes.


29a. That Frank would hit Sam and Bill would hit Harry surprised me.

29b. That [S Bill would hit Sam] and [S Frank [INFL 0] [VP [V 0] Harry]] surprised me.


29c. That Frank would hit Sam and that Bill would hit Harry surprised 
me. 

*29d. [S [S' That [S Frank would hit Sam]] and [S' that [S Bill [INFL 0] [VP [V 0] Harry]]] surprised me]

As predicted, Main Verb Deletion can apply to both conjoined S and conjoined S'
nodes, as shown in (30).

30a. That Frank would hit Sam and Bill would Harry surprised me. 

30b. That Frank would hit Sam and that Bill would Harry surprised me.

Thus this approach correctly distinguishes the two cases of gapping. 
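The decision procedure for the two kinds of gapping can be summarized as a small function. This is a hypothetical sketch: the VisibleConjunct record stands for whatever INFL information has percolated to the maximal node of a conjunct that has already been shunted, and the labels and return strings are our own. It encodes only the choices described above: an overt auxiliary settles the expansion locally; otherwise a tensed or infinitival INFL on a boundedly close S licenses the gapped expansion, and anything else yields a small clause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VisibleConjunct:
    """Hypothetical record of what an already-shunted conjunct still exposes.

    label: "S" (a projection of INFL) or "S'" (not a projection of INFL)
    infl:  INFL value percolated to that node: "tensed", "infinitival", or "0"
    """
    label: str
    infl: str


def expand_second_conjunct(previous: Optional[VisibleConjunct], overt_aux: bool) -> str:
    """Pick an expansion for a locally ambiguous second conjunct."""
    if overt_aux:
        # Main Verb Deletion: an overt auxiliary always signals a VP, so left
        # context is never consulted and no bounding effect is expected.
        return "empty-headed VP"
    if previous is None or previous.label != "S":
        # No boundedly close S is visible (e.g. the conjuncts are S' nodes), so
        # no INFL value can be read off and Tense Deletion cannot apply.
        return "small clause"
    if previous.infl in ("tensed", "infinitival"):
        # Tense Deletion: expand with an empty tense morpheme and an empty verb.
        return "gapped clause"
    return "small clause"


def acceptable(expansion: str, verb_takes_small_clause: bool) -> bool:
    """A small-clause expansion survives only under a verb that selects small clauses."""
    return expansion != "small clause" or verb_takes_small_clause


print(expand_second_conjunct(VisibleConjunct("S", "tensed"), overt_aux=False))   # gapped clause, as in (24)
print(acceptable(expand_second_conjunct(None, overt_aux=False), False))           # False, as in (26a) under believe
print(expand_second_conjunct(VisibleConjunct("S'", "tensed"), overt_aux=False))  # small clause: no Tense Deletion, as in (29d)
print(expand_second_conjunct(VisibleConjunct("S'", "tensed"), overt_aux=True))   # empty-headed VP, as in (30b)
```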

Returning to our first problem, we must show why the problem of complement
vs. adjunct attachment, which applies in both types of gapping, does not force
the parser to look at left context, thus incorrectly predicting that bounding
constraints apply to both kinds of gapping. The treatment in our book assumed
that the semantic interpretation of adjuncts and complements proceeded in
essentially the same way, by reading off tree structure. If we assume this, then it
follows that a deterministic parser must attach PPs and other adjunct phrases
as they are attached by the grammar, in order to carry out semantic
interpretation. However, this assumption is highly dubious. As Miller and Chomsky
(1963), Marcus (1980), and many others note, in certain cases, strings of adjunct
phrases can occur in potentially unlimited configurations. Thus a sequence like
the man in the house by the river by the woods near the town can have any of
the following interpretations:


23 See Zubizarreta (1982) and Stowell (1981).
[in the house][by the river [by the woods]][near the town]

[in the house [by the river][by the woods [near the town]]] 

[in the house [by the river [by the woods [near the town]]]] 

A parser that had to do semantic interpretation from tree structure would
find itself in an exponential regress in such cases. In order to figure out which
interpretation to give the sentence, it would have to compute the correct
syntactic structure, but in order to do this it would have to compute all the possible
patterns compatible with this string, and then see which one it “means to say.” This will
cause an exponential slowdown in the parsing algorithm if all trees must be
explicitly reconstructed. One classic solution proposed by these authors is that
adjunct phrases that can be ambiguous (either between adjunct and complement
readings or between various adjunct readings) should be parsed essentially as
flat structures. Semantic subroutines can then come in later and decide between
the possible readings: a procedure that allows us to maintain efficient parsing.
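To see why explicitly building every tree is costly, one can count the distinct attachment structures for a head noun followed by n PPs, where each new PP attaches to a noun on the right frontier of the structure built so far (attachments may not cross). The sketch below is our own illustration, not a procedure from the works cited; it counts only NP-internal attachments and ignores attachment to a verb. The counts it produces are the Catalan numbers (14 for the four PPs of the example above), which grow exponentially in n; a flat representation, by contrast, stores a single structure and leaves the choice to later semantic routines.

```python
from functools import lru_cache


def count_attachments(n_pps: int) -> int:
    """Count non-crossing attachment structures for a head noun followed by n_pps PPs."""

    @lru_cache(maxsize=None)
    def count(open_heads: int, remaining: int) -> int:
        # open_heads: nouns on the right frontier that the next PP may attach to,
        # ordered from the outermost head noun (1) to the most recent noun.
        if remaining == 0:
            return 1
        # Attaching to the k-th open noun closes every more deeply embedded noun;
        # the new PP's own noun then opens, leaving k + 1 open heads.
        return sum(count(k + 1, remaining - 1) for k in range(1, open_heads + 1))

    return count(1, n_pps)


for n in range(1, 9):
    print(n, count_attachments(n))
```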

Put in the context of the gapping constructions, if a parser cannot figure out 
where an adjunct is attached from the local context, it can simply attach it as a 
flat structure to the lowest node in the parse tree. Then, independently needed 
semantic routines will give this phrase its appropriate semantic interpretation. 
Thus the attachment of adjunct PPs in neither type of gapping can force the 
parser to scan left context. Therefore, the attachment of adjunct phrases does 
not incorrectly predict bounding effects in Main Verb Deletion. 


3 Objections to basic assumptions: transparency and determinism

3.1 What is nondeterminism? 

We'll first analyze the distinction between determinism and nondeterminism, 
and how Fodor views that distinction. Fodor makes two points: 

1. A nondeterministic parser, just like a deterministic one, could benefit from
locality restrictions—if the cost of backup is high. 

2. A deterministic parser cannot recover from error, and so cannot comport 
with what is known about human processing of sentences. 

Nondeterministic parsers do not reflect processing complexity 

Let’s take these points in turn. First, as we said earlier, one must distinguish
between two versions of the nondeterminism hypothesis: true nondeterminism,
where all possibilities are explored in parallel; and simulated nondeterminism,
where one possible parse is explored at a time, and backup occurs if one line
of attack fails. Only the first version makes the nondeterministic/deterministic
parsing distinction clear-cut, and this is the one we chose for comparison. The
second version of nondeterminism is just like the Marcus model in that a single,
particular sequence of parsing decisions is made as we move through the sentence,
left-to-right. It is unlike a deterministic model in that revisions in that
sequence of decisions are assumed to occur all the time.

Fodor does not make the clear-cut choice. Instead, she opts for a deterministic,
one-path-at-a-time simulation of true nondeterminism. This position is
quite weak, because, as Fodor notes, one can turn this simulation into the
functional equivalent of a deterministic parse simply by making the cost of revising
decisions very high:

Every point that M. makes could have been made just as well within 
the context of a nondeterministic parser which cared about efficiency.
(Fodor, page 18) 

Imposing a cost metric on backup, then, gives us more flexibility. But is
this too much flexibility? There are three basic options. If we say that backup
costs are zero, then we have in effect the case of true nondeterminism; if we say
that backup costs are infinite, we have a Marcus model. If we make the costs
somewhere in between zero and infinite, we get a middle view.

Fodor takes this as a virtue: all bases are covered. But is this so? Do we
need at least this three-way split? If one is going to impose a constraint on a
weaker system that has the functional effect of determinism, it would seem just
as sensible to start with that constraint in the first place: assume the machine
is deterministic, and see if the required psycholinguistic complexity options can
be obtained this way. Cutting up the constraints this way makes a difference. A
“cost” metric is the weaker position, because we must justify the metric we use
somehow. That is, we must support both the assumption of nondeterminism
and a particular cost metric. In contrast, a deterministic machine is directly
built to act as if backtracking costs are very high. There is no separate cost
metric device in the Marcus parser; therefore we need not justify one. All we
need to justify is the assumption of determinism, which we must do in any case.

There could be other grounds for the flexibility allowed by a cost-metric 
addition to the nondeterministic model. In a footnote to her paper, Fodor tries 
to turn the cost-metric model to her advantage, as a way to simulate observed 
human sentence processing. Fodor attempts to equate backtracking cost with 
processing difficulty: 

But it could very well be that the really severe garden path sentences
... are those for which all the wrong(=correct,) initial choices
are reconsidered before the one that was truly at fault. This is
where the 2^n figure would approach a realistic estimate of parsing
time, and it would nicely account for the inordinate difficulty of these
sentences .... Thus the striking differences that have been observed
in the processing difficulty of natural language sentences are perfectly
consistent with the mathematical results for nondeterministic
parsing with online backup.

Fodor is claiming that a garden path sentence such as the horse raced past the
barn fell demands exponential parsing time because of backup, while relatively
easier “nongarden path” sentences (such as they told the students that John liked
that Bill would leave) do not. But it is easy to see that both of these require
the same amount of backtracking. The problem is that in a direct backtracking
implementation, backup occurs all the time, even on simple sentences. For the
first sentence, a backtracking parser must make a decision just before raced,
between a relative clause and a VP. Assuming frequency preference, it takes
the VP reading, which fails when fell is encountered. Now it must back up.
We’ll assume the last previous choice point was before that John. In fact,
this is not correct. In a pure backtracking parser, we would have to unwind
to all intermediate choice points: there might be a relative clause after barn;
there might be an NP object after raced; and so on. Finally, we arrive at the
choice at raced and can continue. If the machine can inspect the current word
it is scanning, two or three choice points are involved. 24 More backtracking
correlates with processing difficulty. Even so, such a sentence would not be
impossibly difficult for a backtracking parser. (And remember that it would
be perfectly easy for a true nondeterministic parser.) In fact, the backtracking
parser does not do exponential work on such an example.
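The cost of chronological backtracking in this example can be sketched in a few lines. The encoding is hypothetical: each choice point is a tuple of alternatives in preference order, the three choice points stand in for the decisions at raced, after raced, and after barn, and failure is detected only once a complete analysis has been built (the point at which the parser reaches fell), which is a simplification. A wrong choice at the earliest point forces the parser to try every combination of later alternatives before revising it; with k binary choice points that is 2^(k-1) failed analyses, the source of the 2^n figure in the passage quoted above. With only three choice points the count stays small.

```python
from typing import Callable, Optional, Sequence, Tuple

Analysis = Tuple[str, ...]


def backtrack_parse(choice_points: Sequence[Sequence[str]],
                    accepts: Callable[[Analysis], bool]) -> Tuple[Optional[Analysis], int]:
    """Chronological (depth-first) backtracking over ordered alternatives.

    Returns the first accepted analysis and the number of complete analyses
    that failed before it was found.
    """
    failures = 0

    def search(prefix: Analysis) -> Optional[Analysis]:
        nonlocal failures
        if len(prefix) == len(choice_points):
            if accepts(prefix):
                return prefix
            failures += 1  # a complete analysis failed (the parser hit "fell")
            return None
        for option in choice_points[len(prefix)]:  # preferred option first
            found = search(prefix + (option,))
            if found is not None:
                return found
        return None  # every option failed: unwind to the previous choice point

    return search(()), failures


# Hypothetical encoding of "the horse raced past the barn fell".
choices = [
    ("main-verb", "reduced-relative"),  # at "raced"
    ("no-object", "NP-object"),         # after "raced"
    ("no-relative", "relative"),        # after "barn"
]
target = ("reduced-relative", "no-object", "no-relative")
analysis, failed = backtrack_parse(choices, lambda a: a == target)
print(analysis, failed)  # ('reduced-relative', 'no-object', 'no-relative') 4
```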

What of the second sentence? Fodor must claim that such a case causes
little or no backtracking, relative to garden path sentences. But here too, a
backtracking parser must do a lot of work: before that John liked we call for an
embedded Sentence instead of a relative; similarly before that Bill. When we
get to would we must back up. First, we unwind to that Bill and try a relative
clause reading for it. This fails. Then we back up to the next previous choice
point, and try alternative categorizations for liked. Finally, we arrive at the
choice between a relative and an embedded S just before that John liked. 25
Roughly the same backup takes place here as with the “real” garden path.

Of course, there might be some other parsing scheme to get us out of this
particular dilemma. The problem is that any general scheme to make backtracking
easy will almost necessarily make the garden path sentences easy as
well. At heart, a backtracking parser backtracks, and it is quite difficult to use
ad hoc cost measures to make it perform otherwise.

24 A “pure” ATN does not even look at the current word it is scanning in order to make
a guess about what to do next. But this means that even very simple sentences such as
Be careful involve extensive backtracking, because the machine guesses that it will see a
declarative sentence, then a question, and so forth. This alternative would simply make
our point even more strongly, so we won’t adopt it.

25 Using standard ATN techniques, preference for one phrase type rather than another
can be encoded by ordering the arcs that leave a network state. One can order the arc
alternatives so as to take a relative clause push after that, but then this will be wrong and
fail to account for the preferred embedded-S reading of they told the students that John liked
the story.

Deterministic parsers can recover from garden paths 

Let’s now turn to the second point, about deterministic parsing and error recovery.
While Fodor wants the flexibility to simulate determinism when needed in
her own model, she denies flexibility for a deterministic parser to recover from
garden paths:

The only difference between a deterministic parser and a nondeterministic
parser is that in the former a garden path analysis is
permanent and unrepairable, while in the latter garden paths can
occur and be recovered from during the parse. (Fodor, page 18)

But again, as Fodor acknowledges in her footnote 20, this is not to deny 
that there could be specialized deterministic recovery procedures for garden
path sentences, as suggested by Marcus (1980). For these procedures to apply,
we would of course toe the line of determinism: backup along the lines suggested
by Fodor (or in an ATN) would not be permitted. Ideally, following Marcus’s
definition, the recovery procedure should only be allowed to add information
about the parse, not wipe out what has already been learned. Instead, when the 
parser blocks (because no known rule applies), a recovery procedure could look 
globally at the state configuration of the parser. Then, by slightly rearranging 
existing subtrees of the parse, the recovery procedure should simply add new 
information about the sentence analysis and come up with the correct sentence 
structure. 

Interestingly enough, the Marcus design, slightly modified, provides the
ingredients of just such a theory of garden path sentence recovery. We can only
sketch the basic idea here.

Let us consider again the horse raced past the barn fell. When a Marcus-type 
parser fails on such a sentence, it is reading fell. But there is much information 
in its machine configuration (its pushdown stack and input buffer) of value
for error recovery. It is possible to design a natural recovery procedure that
uses this information deterministically to build the correct output, though at
some cost. For example, in the horse raced example, one need only insert
a new S boundary between horse and raced. There is also room within an
evaluation metric of recovery to differentiate between difficult garden paths and
easy-to-analyze sentences with interpretations. Barton and Berwick (1985) give 
some of the details. Contrary to what Fodor asserts, recovery is possible in a 
deterministic machine. 

3.2 A two-stage design? 

Fodor also takes issue with our division of parsing labor into separate
tree-building and indexing stages. Again, she makes two basic points: first, that this
division is not motivated on grounds of computational efficiency; and second,
that this division is not motivated by the grammar (so that we are violating our
own assumption of transparency connecting grammar and parser). Again, we
disagree.

Consider computational efficiency. Fodor first claims that computational 
reasons alone can’t motivate the bounded-context character of our parser: 

Given that the efficiency results for bounded-context parsing are no
better than for LR(k) parsing in general, the crucial assumption that 
the first stage of B&W’s parser is a bounded context device receives 
no support from these efficiency results. (Fodor, page 41). 

But as Fodor herself notes, computational complexity calculations are often
relative to representational issues. If one picks some other representational
format, then certain computational issues can become irrelevant. For example,
if we adopt true nondeterminism, then it is not difficult to parse any sentence of
a context-free grammar, no matter how ambiguous, in time proportional to the
square of the grammar size and the cube of sentence length (where grammar size
is measured by the total number of grammatical symbols, like NP and
VP, not just the number of rules; see Earley (1968)).

This being so, one cannot divorce a discussion about computational efficiency
from representational format. We have chosen to represent the parser’s
knowledge transparently, that is, to include only those categories sanctioned
by the grammar. The categories of our grammar include only the basic lexical
projections NP, VP, PP, and so on. 26 By saying that our parser works transparently,
we mean that the parser’s rules can only make reference to these literal
symbols. To put the same point another way, transparency requires that the
only states the parser has are the “states” (i.e., the nonterminal names) that
the grammar has. The parser cannot use any derived facts about the grammar;
nor can it appeal to nonterminal symbols that do not otherwise exist. For example,
the parser cannot create a new state in order to “remember” that a wh
phrase has been encountered earlier in the sentence. This would correspond to
a complex nonterminal name such as WH/NP.

In general, LR(k) parsers are allowed to create such states whenever they
are needed. These states (in the form of a finite-state control table) encode the
set of possible left-most derivation patterns for the given grammar. Since they
represent derivation regularities, these states need not map in a 1-1 fashion to
the nonterminal names of the grammar, and in fact the wh sentence example
shows that in some grammars the nonterminals do not match the states of the
parsing machine. 27 However, we have specifically barred the use of parsing
states that do not correspond to lexically projected nonterminal names. Therefore,
our approach does not admit the entire class of LR(k) parsers. Instead,
our parsing rules can make reference only to grammatical symbols. There is a
class of deterministic parsers that defines such a class of machines, namely, the
bounded-context parsers. 28 This is the parsing design we have adopted.

26 Like most syntactic theories since Aspects of the Theory of Syntax, we also include traditional
agreement features like Person, Number, and Gender, as properties of lexical projections.
We explicitly do not include the “slash” feature of Generalized Phrase Structure Grammar
(resulting in complex categories like VP/NP), since this feature is not lexically projected
(X0 or lexical items are specifically barred from having “slash” features in GPSG).

Fodor is correct that general computational grounds do not force the
bounded-context choice on us, but that is trivially so. For example, if we adopted a
more powerful device, such as a nondeterministic device, we would not need
this structure. But, all other things being equal, it is the stronger assumption.
Transparency is stronger, because we need not posit any entities beyond those
the grammar already gives us; and all other things are equal, because in this
case “all other things” is simply parsing efficiency and an account of the
psychological facts about parsing unbounded dependencies. 29 It is of course true that
a parser need not respect the representations provided by the grammar. But it
is simpler to assume that it does. A grammar that contains just projections of
lexical items is smaller, simpler, and hence easier to learn than one that does
not. There’s a sense in which such a parser is completely lexically based: there
are just projections of lexical items, and nothing more.

Fodor also argues that transparency itself does not motivate a literal bounded-context
parser, because the grammar contains rules that mention variables: “as
long as the transformational rules of the competence grammar can contain variables
(explicit or implicit) we would expect parsing rules employing the same
metalinguistic vocabulary to do the same.” She concludes that we need “an explicit
prohibition against variables in the parsing rules” (Fodor, page 47). But
again, there are two parts to any computational operation: the procedure itself,
and the data structure or representation it works on. In this case, there are
no variables because there are no complex category symbols, and because the
rules of the machine are finite. As Fodor notes, these are indeed “stipulations”
(page 48); one must always assume something in arguments about computational
matters, since we don’t have the luxury of neurophysiological findings.

27 This transparency distinction also shows up in the way that LR(k) parsers are built. The
usual approach is to process an LR(k) grammar to derive a finite-state control table that is
actually used for parsing. The states of this table need not, and usually do not, correspond
in any transparent way to individual nonterminal names. Instead, in effect they stand for
theorems about derivations in a particular grammar. By banning such nontransparency, we
are banning such preprocessing.

28 See Floyd (1964). Actually, we must define an extension of the bounded-context parsers that
uses nonterminal lookahead as the Marcus machine does. For details, see Berwick (1985).
We could also vary other details of the bounded-context design, as long as we retain the
key feature: parsing rules must refer only to grammatical symbols, not to parsing states.

29 To make the same point in reverse, the only evidence for the more powerful machinery of
a hold cell or “slashed” categories seems to be the ability to parse unbounded dependencies.
But if this can be explained without resort to such machinery, then this leaves its
justification unestablished.




Similarly, Fodor “stipulates” that a grammar allows machinery beyond basic X-bar
categories, and that the parser includes backtracking as a standard feature. The
question is how natural these stipulations are. In fact, in Government-Binding
theory, the rule Move α does not have variables (Chomsky 1977, 1981, is quite
explicit on this point). Deletions, on the other hand, can have variables, but
this is not relevant for parsing because deletions are locally unambiguous (see
the previous section on Gapping and Berwick and Weinberg (1984)).

Beyond this question of bounded-context parsing, Fodor then goes on to 
question our division of parsing into two stages at all. She again claims that we 
violate our own criterion of transparency and that such a division is not needed 
on grounds of efficiency. 

The efficiency counterargument, at least in one form that Fodor gives, goes
something like this. Our second-stage procedure computes referential
dependencies, for instance that John and he may denote the same person in
sentences like this:


Johnᵢ believes that Fred thinks that Sue said that heᵢ is smart.

Since this procedure, whatever it is, must be able to search unbounded 
domains, why not just let it do the job of searching for the antecedent of a wh 
phrase? Alternatively, why not just fold the two stages together, combining both 
jobs into one? In effect, Fodor wants to “multiply out” the two representational 
levels we have distinguished into a single one because this is more efficient. 30 

Since Fodor elsewhere (Crain and Fodor 1984) has herself argued for the
computational benefits of nonmodular representations, it is worthwhile to see
just what is at stake here. Fodor’s support for nonmodularity is surprising.
First of all, from the standpoint of computer science generally, it cuts against
the grain of all that is known about the efficient solution of complex problems.
(See, e.g., standard works on algorithms, such as Knuth, 1973; Aho, Hopcroft
and Ullman, 1974.) Second, the key point is that for modularity to work the
distinct levels should have different representational properties, because each
is designed to highlight different aspects of the same problem. This is the
source of the power behind the idea of two levels of representation, words and
phrases. It is easier to state the facts about agreement if we use Noun Phrases
and Verb Phrases rather than simple words, because then we have just two
simple representational units adjacent to one another (NP next to VP). In fact,
a simple finite-state automaton suffices, given that the phrases are constructed
first. Similarly, there are facts about language that are more easily stated in
terms of a linear arrangement of words, e.g., that a Determiner precedes a Head
Noun, and may agree with it. This (oversimplified) factored representation

30 At times, Fodor suggests just the opposite, as when she proposes that the first and second
stages ought to divide computational labor between them: “the first stage device might
call on the second-stage device to do the antecedent check prior to trace postulation. This
might call for a slightly more complicated routine to pass control back and forth between
the two, but the labor saved could very well compensate.” (Fodor, page 43)


can be modeled as a cascade of finite-state transducers, where the first-level
system, that of words, builds a phrasal representation and feeds the second
level. Is it possible to collapse these two levels into one? Yes: one can “multiply
out” all combinations of words and eliminate the phrasal level, by forming the
product of the two finite-state machines representing each level (see Berwick
1982). However, it does not make sense to collapse these two levels into one.
The collapsed representation is much larger, because all possible combinations
of constraints, previously expressed independently at each level, are now written
out explicitly. In general, if the constraints on one level can be expressed by a
machine of size n, and the constraints on a second level can be expressed by a
machine of size m, then the collapsed machine could be of size mn. 31 In fact,
this is one traditional argument for a multiple-levels view of language, as
initially expressed in Chomsky’s Logical Structure of Linguistic Theory. There
are two computational advantages to the modular view: one, just mentioned, is
that the resulting system is smaller and hence easier to learn, if we equate
smaller size with easier learning; the second is that we can design computational
procedures tailored to work with the specific formats of each level.
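The size argument can be illustrated with the textbook product construction for two deterministic finite automata. This is a sketch under simplifying assumptions: it uses two acceptors over a shared alphabet as a stand-in for a cascade of transducers, and the two "constraints" (counting one symbol modulo 3, another modulo 2) are hypothetical. The point carries over: the collapsed machine's states are pairs of component states, so its size can approach m × n even though the two separate machines need only m + n states.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

# A DFA here is (start, accepting, delta) with delta[(state, symbol)] -> state.
DFA = Tuple[Hashable, FrozenSet[Hashable], Dict[Tuple[Hashable, str], Hashable]]

def product_dfa(a: DFA, b: DFA, alphabet: Iterable[str]):
    """Build the reachable part of the product automaton (intersection)."""
    start_a, acc_a, delta_a = a
    start_b, acc_b, delta_b = b
    symbols = list(alphabet)
    start = (start_a, start_b)
    states: Set[Tuple[Hashable, Hashable]] = {start}
    frontier = [start]
    delta = {}
    while frontier:  # explore only state pairs reachable from the start pair
        p, q = frontier.pop()
        for sym in symbols:
            nxt = (delta_a[(p, sym)], delta_b[(q, sym)])
            delta[((p, q), sym)] = nxt
            if nxt not in states:
                states.add(nxt)
                frontier.append(nxt)
    accepting = frozenset(s for s in states if s[0] in acc_a and s[1] in acc_b)
    return states, (start, accepting, delta)

def accepts(dfa: DFA, word: str) -> bool:
    state, accepting, delta = dfa
    for sym in word:
        state = delta[(state, sym)]
    return state in accepting

def mod_counter(symbol: str, modulus: int, alphabet: str) -> DFA:
    """Hypothetical constraint: `symbol` occurs a multiple of `modulus` times."""
    delta = {(s, c): ((s + 1) % modulus if c == symbol else s)
             for s in range(modulus) for c in alphabet}
    return 0, frozenset({0}), delta
```

Combining a 3-state and a 2-state counter over the alphabet "nv" yields 6 reachable product states; counters modulo 7 and 5 (12 states in total when kept separate) yield 35. The combined machine accepts "nnnvv" and rejects "nnv", exactly as running both separate machines would.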

This is exactly what we aimed for in our two-stage model. Each level has a
different representation that highlights different aspects of the computation of
linguistic structure, and each is designed to ease the computation of properties
relevant to that level. The first level deals with questions of how to build a
tree, and uses notions like dominance and precedence. For example, in the
sentence given just above we expand the tree in exactly the same way no
matter whether he is bound to Fred or is a free pronoun bound to a
discourse NP that occurred much earlier. This contrasts with cases governed by
Subjacency, where the presence or absence of an antecedent tells us how to expand
the tree we are building. If there is an antecedent in the structure and a verb
that selects or subcategorizes for an NP, we create a trace slot in the phrase
structure; otherwise, we do not. This is a decision about tree structure.

Roughly speaking, referential dependencies can cut across sentences and 
involve all the objects mentioned in a discourse—plainly outside the purview 
of sentence tree predicates. Secondly, referential dependencies are calculated 
on a different representational base from phrase structure, just as Subject-Verb 
agreement is calculated at the level of phrases rather than words. 

If we tried to collapse the referential dependency calculation together with
tree-building, the result would be exactly what would happen if we tried to
compute Subject-Verb agreement at the level of words. As we show in our book
(Berwick and Weinberg 1984), our first-stage procedure works in linear time,
in time cn, where c is a constant depending on the size of the output phrasal
structure and the size of the grammar, and n is the length of the input sentence.

31 For more realistic representational formats, e.g., context-free grammars, the savings can be
even larger. See Berwick 1982 for details. See the next section for additional comments on
this problem and grammar size.



The search for referential antecedents would now have to look at a representation
defined over complex tree shapes, including many irrelevant structures.
We note in our book that in the worst case this would increase analysis time
to kn², where n is the length of the input sentence, and k is some constant
that depends on the size of the phrase description. It is already apparent that
pronoun referential dependency can extend across sentences. It is also apparent
that this computation can be nonlinear: consider the laborious calculation that
seems to occur when one uses a pronoun whose antecedent lies many sentences
behind in a discourse. By combining these two steps, Fodor would make the
first-stage procedure nonlinear as well. But as she herself notes
(page 68: “in general, linear time parsing is surely just what a model of the
human sentence processing mechanism should aim for”), this would have the
unfortunate effect of making the construction of tree structure for single
sentences potentially nonlinear. We want to avoid this. We would like to recover
the right tree structure in linear time, even if the pronoun antecedents are not
in place. Note that there is much we can interpret about a sentence if we have
its correct phrase structure, even if we do not know that he is dependent on an
earlier NP. Fodor’s collapsed scheme in effect forces the machine to stop and
wait for the right antecedent calculations to complete before plunging on. 32

By factoring apart the stages of tree construction and referential dependency
calculation, we gain at the second stage as well, because the size of the structures
the search procedure works over can be made smaller. That is, instead of
running our procedure in time cn², where c is large, we can run it in time
kn², where k depends only on a short list of NPs. As we noted in our book, this
is a difficult argument to make because in most cases sentences are short. But let
us see what it means in detail. The second-stage representation includes shunted
predicates and NPs. It is a simple matter to take this propositional representation
and build a finite-state transducer (standing for a homomorphism) that projects
just the NPs from this second-stage list. We may imagine this projected bag of NPs
to be the discourse NPs for this sentence; it could include, perhaps, the NPs for
previous sentences, but just NPs. It is because we have now isolated these
units on a separate level that the search for referential dependents is easier. No
other units stand in the way of a direct search through the NP list. In most
cases, there will be only a few NPs to look at. Note that this method only works
because we have set up the first stage to build just the right structured list so
as to provide the right NPs to look through. Further, in those cases where
the list is large, we expect to find nonlinear processing difficulty: informally
at least, precisely what seems to happen when there are many potential NP
antecedents. 33
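The projection-then-search idea can be sketched in a few lines. This is an illustration under explicit assumptions: the item labels ("NP", "PRED") and the agreement features used to filter candidates are hypothetical and are not taken from the authors' representation; only the two steps (erase everything but NPs, then scan the short NP list) follow the text. The cost of the search depends on the length of the projected list, not on the size of the tree.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Item:
    category: str                           # hypothetical labels: "NP" or "PRED"
    text: str
    features: FrozenSet[str] = frozenset()  # hypothetical agreement features

def project_nps(items: List[Item]) -> List[Item]:
    """Homomorphism h(NP) = NP, h(anything else) = empty string.
    A one-state transducer that copies NPs and erases the rest computes it."""
    return [item for item in items if item.category == "NP"]

def candidate_antecedents(pronoun: Item, nps: List[Item]) -> List[Item]:
    """One linear pass over the projected NP list; no tree structure is consulted."""
    return [np_item for np_item in nps if pronoun.features <= np_item.features]

# Second-stage list for "John believes that Fred thinks that Sue said that he is smart"
SENTENCE = [
    Item("NP", "John", frozenset({"3sg", "masc"})),
    Item("PRED", "believes"),
    Item("NP", "Fred", frozenset({"3sg", "masc"})),
    Item("PRED", "thinks"),
    Item("NP", "Sue", frozenset({"3sg", "fem"})),
    Item("PRED", "said"),
]
HE = Item("NP", "he", frozenset({"3sg", "masc"}))
```

Here the projection leaves three NPs, and the scan for `HE` returns John and Fred as candidates. Appending discourse NPs from earlier sentences lengthens only this list, which is where the text locates the processing cost.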

32 One could design a “pipelined” scheme where a second-stage referential dependency
calculation works off the input from a first-stage device. But this is just our two-stage model in
another guise.

33 That is, a linear list of this kind, if long enough and if it included discourse NPs, might 
take linear time to search for any single NP. Of course, there are other possibilities, since 



To summarize, we argue that isolating the referential dependency calculation
in this way pinpoints an important functional distinction between building tree
structure and computing referential dependencies. Tree construction is fast
(linear time, and, in fact, real time if one examines our procedure in detail):
each phrase is built in a bounded amount of time. Coindexing (or referential
dependency calculation), which can be nonlinear, does not interfere with this.
Fodor’s proposed one-stage model, because it interweaves these functionally
distinct processes, slows both down.


3.3 Another source for locality principles? 

Finally, Fodor contends that locality principles could be motivated in a GPSG-type
theory, both on grounds of easy parsability and, as we ourselves note, on
grounds of learnability:

This negative result does not mean that subjacency could not be
functionally grounded in a GPSG. As chapter 3 observed, there are
many possible “functional” constraints that could have played a role
in the shaping of language. Foremost among these, at least
traditionally, is learnability. (Berwick and Weinberg 1984:166)

Fodor makes two specific proposals along these lines, one for parsability, and
one for parsability/learnability. Let's take each in turn.

Consider first her argument that a GPSG parser would benefit from locality
constraints resolved by context on the right, in sentences such as Who did you
help ..., where the parser must decide whether to insert a trace after help
or keep going so that the trace will appear in some lower complement. But
once again, this constraint just doesn’t matter under the true nondeterministic
model. Advocates of GPSG often cite the parsing results for general context-free
grammars as evidence that such a system will work efficiently. But then Fodor’s
demand for constraints on context becomes more mysterious. Suppose one uses
Earley’s parser for context-free grammars. This is one standard algorithm on
which the efficiency results for generalized phrase structure grammar are often
based. Then all parses are kept in parallel, and there’s no problem at all: both
alternatives are carried along, and when the problematic gap appears or fails to
appear, one of the possibilities falls by the wayside. There is no reason that the
locality constraint must exist. The point is not that the GPSG parser cannot
be made to benefit from a locality constraint but that it doesn't need to benefit
from a locality constraint in the right-context situation. 34
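To see why a parallel parser has no use for a right-context locality constraint, here is a compact Earley recognizer. It is a sketch, not a production parser: it handles empty productions with the Aycock and Horspool refinement (stepping over nullable symbols at prediction time) rather than Earley's original presentation, and `GAP_GRAMMAR` is a hypothetical fragment that uses GPSG-style slash categories, with NP/NP → ε standing for the gap. At help the chart simply holds both the "gap here" and the "gap further down" analyses; whichever one the rest of the input contradicts fails to complete.

```python
from typing import Dict, List, Sequence, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, origin column)

# Hypothetical GPSG-style fragment: NP/NP -> (empty) is the gap.
GAP_GRAMMAR: Grammar = {
    "Q": [("who", "did", "NP", "VP/NP")],
    "VP/NP": [("V", "NP/NP"),          # gap right after the verb
              ("V", "NP", "VP/NP")],   # overt object, gap further down
    "NP/NP": [()],
    "NP": [("you",), ("Mary",)],
    "V": [("help",), ("meet",)],
}

def nullable_symbols(grammar: Grammar) -> Set[str]:
    """Nonterminals that can derive the empty string (fixed-point computation)."""
    nullable: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            if lhs not in nullable and any(all(s in nullable for s in rhs)
                                           for rhs in alternatives):
                nullable.add(lhs)
                changed = True
    return nullable

def _add(item: Item, column: Set[Item], agenda: List[Item]) -> None:
    if item not in column:
        column.add(item)
        agenda.append(item)

def earley_recognize(grammar: Grammar, start: str, tokens: Sequence[str]) -> bool:
    nullable = nullable_symbols(grammar)
    chart: List[Set[Item]] = [set() for _ in range(len(tokens) + 1)]
    chart[0] = {(start, rhs, 0, 0) for rhs in grammar[start]}
    for i, column in enumerate(chart):
        agenda = list(column)
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:  # predict: every expansion is kept in parallel
                    for alt in grammar[sym]:
                        _add((sym, alt, 0, i), column, agenda)
                    if sym in nullable:  # Aycock-Horspool: also step over a nullable symbol
                        _add((lhs, rhs, dot + 1, origin), column, agenda)
                elif i < len(tokens) and tokens[i] == sym:  # scan a matching terminal
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance items that were waiting for lhs
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        _add((l2, r2, d2 + 1, o2), column, agenda)
    return any(l == start and d == len(r) and o == 0
               for l, r, d, o in chart[len(tokens)])
```

"who did you help" and "who did you help Mary meet" are both recognized, while "who did you help Mary" is rejected because neither analysis finds a place for the gap. No decision about the gap is ever forced before the input that settles it arrives.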

not much is known about the representation of semantic structures. For example, it could
be that such NPs can be accessed in constant time, up to a certain memory limit, as if one
could instantly remember the last 10 things mentioned. If so, then processing difficulties
might not show up on short sentences. Like so many other details about processing, this
one hinges on representational questions that we cannot answer in detail as yet.

34 Alternatively, one could dispense with the Earley algorithm and come up with some other
parsing algorithm for these systems. But then it remains to establish that this alternative




What about our trace-based parser, then? Why can’t we add similar parallelism
and thus avoid the need for a locality constraint? Remember that our
parser design does not have complex categories such as S/NP, VP/NP, and so
on; it can use just the unalloyed categories provided by X-bar theory. It does not
use a hold cell, or any other special memory. Given these transparency constraints,
it is interesting that while true nondeterminism makes a locality
constraint for right-disambiguating contexts superfluous, it actually leaves the
demand for Subjacency unscathed. Consider what happens if we had a truly
nondeterministic, trace-based analysis of sentences such as What did Mary say
... that John ate? Note that the analysis is completely determined up to the
point where the “gap” after ate is encountered. That is, the parser is not
carrying along two analyses at this point, as it is in the right-context case. At
ate the parser takes the nondeterministic solution: it writes out one parse with
the trace inserted, and one with it not inserted. But now what? The sentence
ends. No additional information is forthcoming, and yet there are still two
viable analyses of the sentence. One of these is grammatical (where the trace is
inserted) and the other is not. Yet the sentence is not ambiguous: it is not
interpreted as having two analyses, one grammatical, one not. There is no evident
way to force the other reading out. Thus, the nondeterministic analysis actually makes
things worse here: it yields two candidate interpretations when only one will
suffice. To resolve these, we must now rescan the output analysis tree to pick
up whether a wh was present, adding to the computational cost. Right context
won't help us here, because there is no right context. But there’s no evidence
that this reanalysis occurs, or that such a sentence is hard to process. We
conclude that nondeterminism does not help us if we have only the categories S,
NP, VP, etc. and no Subjacency; on the contrary, it hurts. Thus, Subjacency
is still predicted in our model, unlike Fodor’s. Note that this is quite unlike
the right-disambiguating context case, where pursuing alternatives in parallel
allowed us to hold off making a decision until information became available.

What about the second proposal, about learning? Just before her conclusion,
Fodor suggests that a GPSG system might need locality constraints to make its
rule system smaller, hence more easily parsable and, as suggested in the other
papers where she has advanced this proposal (Fodor 1984), more learnable.

In the absence of any details about just how easy or hard it is to parse a
full-scale derived rule system, it is difficult to judge this proposal. We must
first emphasize that Fodor here is talking about a grammar that explicitly lists
possible phrase structure patterns rule by rule. This is rather different from the
current GPSG framework that represents a grammar via a set of dominance and
precedence statements (ID/LP format) for basic phrasal relationships, implicational
statements to encode feature redundancies, and metarules to account for
systematicities like active-passive sentences (Gazdar, Klein, Pullum, and Sag,
1985). What one finds is that in any reasonably full-scale grammar, for, say,


parsing method, whatever it is, is efficient. Fodor does not offer a concrete alternative.




English, the explicit rule system is so large that there’s only marginal gain in
“reducing” the size of an explicit rule system in the manner Fodor suggests.
This is because the reduction is minuscule compared to the total overall size of
the rule systems themselves. Let’s see why this is so.

To begin, we must be precise. Since Fodor wants to make an argument about
improving parsing efficiency by reducing grammar size, let us define grammar
size as the total number of symbols in the grammar accessed for parsing.
This is the standard measure. (See Earley 1968 for discussion.) We do not
want to use the total number of individual rules of the grammar, because this
would weight against rule systems with “short” rules (e.g., A→BC; B→DEF as
opposed to A→DEFC).
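A minimal sketch of the symbol-count measure follows, assuming each production contributes one symbol for its left-hand side plus one for each right-hand-side symbol. This is the convention under which a Chomsky-normal-form production A → BC has size 3, matching the text's figure for the Sager grammar. The function name and the tuple encoding of productions are illustrative choices, not taken from the source.

```python
from typing import List, Tuple

# A production is (lhs, rhs), e.g. ("A", ("B", "C")) for A -> BC.
Production = Tuple[str, Tuple[str, ...]]

def grammar_size(productions: List[Production]) -> int:
    """Total symbol occurrences: one for each left-hand side plus
    one for each right-hand-side symbol of every production."""
    return sum(1 + len(rhs) for _lhs, rhs in productions)
```

On this measure the two short rules A→BC and B→DEF have size 7 and the single rule A→DEFC has size 5, a ratio of 1.4 rather than the factor of 2 a rule count would give. A grammar of 200 binary productions has size 3 × 200 = 600.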

Let us now compare the grammar size of an explicit phrase structure rule
system that allows a one-S extraction constraint vs. one that allows extraction
across three S’s. Elsewhere (Fodor 1984), Fodor has suggested this as an
example of the benefits of constraints: the tighter the constraints on extraction, the
fewer the rules. While this is literally true, the problem is that such a grammar
is already so large that any minor effect imposed by one new constraint is
swamped out.

It is of course quite difficult to know what the “true” grammar size for
such a system is, because we do not know what the “true” grammar of any
natural language is, even of English. However, we can say this much: any such
explicit rule system must have a rule for every possible surface phrase structure
pattern. How many such patterns are there? Perhaps the most systematic study
of such patterns has been carried out in the context of Sager’s work (1981).
For instance, Hobbs (1974) estimates that a subpart of the Sager grammar,
when expanded out into a context-free form, would be “about several orders
of magnitude larger” than the 200 productions and 300 context restrictions it
contains in context-sensitive form (1974:132). That is, the expanded grammar
would have about 20,000–60,000 context-free rules. 35 We take this as a
fairly conservative estimate of the number of explicit, rule-by-rule descriptions
of phrase structure patterns in English. 36

The Earley algorithm runs in time at most |G|²n³, where |G| is the grammar size
and n is the sentence length in tokens. That is, using the Earley algorithm with a fully expanded,

35 The initial grammar’s productions are in Chomsky normal form, and therefore have a size of
3 per production. Thus the initial grammar size is about 600, with 300 context restrictions.

36 Note that most grammatical descriptions that appear in the computational literature in
fact describe only small fragments of natural languages, quite reasonably, since they are
often designed to illustrate one or another theoretical point, or work within a sublanguage
that serves some functional end (like database retrieval); they are not designed for broad
coverage. For instance, the example GPSG system described by Gawron, King, Lamping,
Loebner, Paulson, Pullum, Sag, and Wasow (1982) for database retrieval has an expanded
grammar size of about 1500–1800 (1982:77), but does not include many sentence types and
restrictions of the Sager grammar. For instance, appositives and sentence adjuncts of many
different types are not included (little did she know that ...; Whatever you say, the guy, the very
same person you saw yesterday, is ...).
explicit rule system for English, the running time would be at worst 1.6 ×
10⁹n³, or about a billion × n³. The result is that any change brought about by
introducing a constraint on extraction across one S rather than, say, three, is
irrelevant. The base grammar with three-S extraction will need two or three
extra nonterminal symbols, in order to “count” how many S’s have been crossed
(S₁, S₂, S₃). Suppose this adds 50 new rules. What happens to parsing time?
It is “exploded” from 1.5 billion n³ to 2.4 billion n³: an increase, to be sure,
but one that cannot possibly matter, because the constant factor is already so
large.

We do not mean to take this as a serious calculation; it is quite speculative.
However, the quantitative point still stands. This exercise is simply designed
to demonstrate that an explicit rule system doesn’t exhibit the right kind of
demarcation between one and more than one that is so characteristic of natural
languages. Details about grammar size aside, if extraction across two domains
does not lead to a processing burden, then it is hard to say why three rather than
four or five domains does. Any system grounded on explicit phrase structure
rules does not naturally distinguish between a locality condition that acts over,
say, three domains and one that acts over a single domain. We just saw that there
could be no relevant difference for parsing, or for learning (if we equate size of
rule system with difficulty of learning). But we suspect that this simply misses
an important property of natural grammars: namely, that they do not have
“counting” predicates that distinguish between two, three, or 17 domains.
This is evidently a property of grammars generally, and has some power in
explaining the metrical structure of phonological rule systems (see Halle and
Vergnaud forthcoming, 1985). But why do grammars have this property? If we
assume that rule systems are written in a derived fashion, as Fodor insists, then
there is no reason for it. A grammar that counts to 16 is just as easily parsed
and just as easily learned as one that does not.

Suppose, in contrast, that there are no phrase structure rules, no explicit
derived rules at all. Instead, suppose that there are just individual lexical items
and their feature projections (as defined by X-bar theory), plus the movement rules
and constraints defined by GB theory. Now there cannot be any rule of grammar
that cuts across just three S domains. Individual lexical items can subcategorize
for single S’s, and hence build phrases consisting of adjacent S domains. Since
movement can apply, we can move elements across these domains. Cyclicity
(iteration of this process) leads to superficially unbounded movement. But no
other constraints can even be stated. The vocabulary for writing down grammars
cannot refer to phrase structure rules, and so cannot write down a chain of
three S expansions to allow extraction across three S’s but not four. As we
observed in our book, either free (unbounded) movement is possible, or else
movement across a single category is blocked; nothing in between is allowed.
This result, the noncounting property evidently true of natural grammars, follows
from the nonexistence of derived phrase structure rules. 37

Of course, nondeterminism and the flexibility allowed in writing derived
grammars leave open many possibilities. As we have seen, this is exactly what
is wrong with a weak set of hypotheses: it leaves open too many avenues to
explore. As we said at the outset, we prefer to tackle the problem head on,
by adopting strong constraints that lead to interesting predictions and
explanations of why natural grammars are built the way they are, giving up those
constraints only when absolutely necessary. So far, we’ve been encouraged by
the results. Our predictions about locality principles, suitably revised, hold up.
Our modular design leads to testable hypotheses about the role of c-command
in language processing, now being probed (Weinberg and Garrett, forthcoming).
Our transparency assumption leads to noncounting grammars. We see no
reason to abandon the chase now, when we have come so far.


37 As far as we can tell, this property also holds in current GPSG frameworks that avoid
explicit phrase structure rules and use subcategorization and ID/LP statements instead
to define a set of admissible phrase structures. Thus this version of GPSG also obeys
noncounting.